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Abstract 



In this note we study the redundancy aware network design problem with laminar demands. In this
problem we are given a graph, several source-sink pairs, and a universe of data packets. Each sink desires
a subset of the packets from its corresponding source. Our goal is to find a collection of paths, one for
each source-sink pair, such that the total cost of routing packets over these paths is minimized. The cost
of routing on an edge in the graph is proportional to the total size of the distinct packets that the edge
carries. We assume that the collection of packet sets is laminar, that is, the packet sets desired by any
two pairs of sources and sinks are either disjoint or one contains the other. For this setting, we present
a primal-dual based 2-approximation, improving upon a logarithmic approximation due to Barman and
Chawla [4].
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1 Introduction 

In a network design problem we are given a graph and a traffic matrix specifying the amount of traffic to 
be sent between pairs of nodes in the graph. The goal is to build a minimum cost network that can support 
the given traffic matrix, as well as to specify a routing for the traffic on that network. The cost of routing 
traffic is a function of how much and what traffic each edge carries. For example, suppose that we can buy 
bandwidth on any edge at a fixed rate, then to minimize costs, we should route the traffic over shortest paths 
between the sources and the sinks. On the other hand, if the cost on an edge depends only on whether or not 
the edge carries traffic, but not on the amount of traffic it carries, then routing traffic along a minimum cost 
Steiner forest is the optimal solution. 

We study a network design problem in the context of a communication network that can leverage the 
redundancy in data to save on routing costs. The input to the problem consists of multiple commodities, 
each with a source and several possible destinations that we collectively call terminals. Each commodity is 
composed of a number of different data packets drawn from a universe $\Pi$ of packets; we call these sets of
packets demands. Importantly, there is redundancy in traffic — different commodities may overlap in the sets 
of packets they contain, and so can benefit from using common routes. In particular, we assume that routers 
in the network can detect and remove duplicate packets, and can also replicate packets. This means that 
an edge carrying two different commodities that share common packets need only transmit each common 
packet once, thereby leading to savings in bandwidth and the cost of routing. Our goal, again, is to find a 
minimum cost routing for the given traffic matrix, assuming that we can buy bandwidth at a fixed rate on 
every edge. Formally, our solution specifies for each commodity a routing tree spanning all of the terminals 
for this commodity. The cost of this solution on any particular edge is proportional to the total size of the 
distinct data packets that the edge carries; in other words, it is a coverage function over the collection of 
commodities that use this edge. This problem was introduced in [4], and is called the Redundancy Aware
Network Design problem (RAND). 

Redundancy aware network design displays the same short-routes-versus-shared-routes tradeoff present 
in several classical network design problems with nonlinear costs, such as rent-or-buy network design [12, 9],
access network design [2], and buy-at-bulk network design [3, 10, 13, 14]. However, there are fundamental
differences. The buy-at-bulk cost model is inspired by economies of scale in a physical commodity
network: the volume of traffic that an edge carries is the sum of the volumes that the different commodities
impose on it and the routing cost on the edge is a concave function of the total volume of traffic. On the 
other hand, in our setting, the volume of traffic itself is lowered due to the inherent nature of data traffic. In 
particular, this means that the savings achieved depend on the contents of the traffic and not just its quantity. 
We not only need to bundle traffic streams as much as we can, but we also need to decide the right sets of 
traffic streams to bundle. Consequently, the approximability of the problem also depends on the extent and 
manner in which different commodities share packets. When every source-sink pair in the network demands 
a distinct packet, that is, there is no data redundancy in the network, the problem reduces to finding the 
shortest route for each pair and can be solved exactly. When all of the demands are identical, the problem 
reduces to finding a single optimal Steiner forest over all of the terminal sets, and can be approximated 
within a factor of 2. 

In this note we focus on a special case of the problem that captures much of its complexity — the laminar 
demands setting. In this setting the packet sets corresponding to the commodities form a laminar family: 
the packet sets of any two commodities are either completely disjoint or one contains the other. There is a 
natural hierarchy over commodities in this setting and any commodity can use for free an edge that is being
used for another commodity that "dominates" it. So we may favor long routes for a commodity if they share 



edges with a dominating commodity over short ones that do not share edges. Less intuitively, it may be 
useful to pick similar routes for two commodities with disjoint packets sets if a portion of the shared route 
can be used for a commodity that dominates both. 

To form intuition for RAND in the laminar setting consider a setting with k different packets and k + 1
commodities: for i ≤ k the demand set of commodity i contains only packet i, and the demand set of commodity
k + 1 contains all of the k packets. Suppose also that every commodity has a single source and a single 
sink. Then, one approach to solving the problem is to first find a least cost path for commodity k + 1, 
and then find least cost paths for the remaining commodities using the edges in the first path for free. This 
approach misses solutions where a slightly longer path for commodity k + 1 is much more cost efficient
for the remaining commodities than the shortest path for k + 1. An alternative is to first find shortest paths
for commodities 1 through k, and then find the least cost path for commodity k + 1 that can use edges in
previously picked paths at a cheaper cost. This misses solutions where picking slightly longer paths for 
commodities 1 through k leads to a greater sharing of the edges. The first approach is indeed the approach 
analyzed in [4] for the special case of the problem where there is a single source that belongs to all of the
terminal sets. That paper shows that in any single source laminar setting routing commodities in order of
decreasing sizes of demand sets achieves an $O(\log |\Pi|)$ approximation where $|\Pi|$ is the number of different
packets in the universe, and this factor is tight.
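
The tradeoff described above can be checked numerically. The following is a minimal sketch on a hypothetical four-node instance: the node names s, u, v, t, the edge costs, and the helper `routing_cost` are all assumptions made for illustration. Commodity k + 1 travels from s to t and carries all k packets; each commodity i ≤ k travels from u to v and carries only packet i.

```python
def routing_cost(edge_cost, routes):
    """Total cost when every edge pays its unit cost times the number of
    distinct packets it carries.

    routes is a list of (path_edges, packet_set) pairs, one per commodity.
    """
    packets_on = {}
    for path, packets in routes:
        for e in path:
            packets_on.setdefault(e, set()).update(packets)
    return sum(edge_cost[e] * len(p) for e, p in packets_on.items())


k = 5
# Hypothetical costs: the direct edge s-t (10) is shorter than s-u-v-t (11).
edge_cost = {"st": 10, "su": 1, "uv": 9, "vt": 1}
big = set(range(k))                              # demand of commodity k + 1
small = [(["uv"], {i}) for i in range(k)]        # commodities 1..k, each u -> v

# Shortest path first for commodity k + 1: nothing is shared.
shortest_first = [(["st"], big)] + small
# Slightly longer path for commodity k + 1: the small commodities ride for free.
shared = [(["su", "uv", "vt"], big)] + small

cost_shortest_first = routing_cost(edge_cost, shortest_first)
cost_shared = routing_cost(edge_cost, shared)
```

On this instance the longer route for the dominating commodity costs 11k in total, while committing to its shortest path first costs 19k.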

In this paper we extend and improve the result of [4] to obtain a 2-approximation for the laminar demands
setting with arbitrary terminal sets. Our approach is a hybrid of the two described above. At a high level, 
we first consider commodities in increasing order of the sizes of their demand sets. However, instead of 
committing to a single path for each commodity before considering the next, we keep around a collection of 
all possible near-optimal paths for the smaller demand sets before considering choices for the larger demand 
sets. Then in a second pass, we finalize a single path (tree) for each commodity, considering commodities 
in decreasing order of sizes of their demand sets. That is, we commit to paths for the larger demand sets 
before finalizing paths for the smaller demand sets. In order to maintain a collection of all near-optimal 
paths efficiently we use a primal-dual approach. The duals constructed for each commodity give a succinct 
description of all possible short paths connecting the source and the sink for that commodity. After having 
constructed all of the duals, we perform a reverse delete step that finalizes paths for commodities starting 
from the one with the largest demand and moving on to smaller demand sets. 

1.1 Related work 

The cost structure in RAND is uniform in the sense that costs on different edges are related through constant 
factors. Obtaining a randomized $O(\log n)$ approximation for network design problems with a uniform cost
structure is often easy: we can use the tree embeddings of Bartal [5] and Fakcharoenphol et al. [6] to convert
the graph into a distribution over trees such that distances between nodes are preserved to within logarithmic 
factors in expectation. Then the expected cost of the optimal routing over the (random) tree is related within 
logarithmic factors to the cost of the optimal routing over the graph. Moreover, the problem is easy to solve 
on trees, because there is a unique path between every pair of nodes. 

As mentioned earlier, RAND is closely related but incomparable to other models of network design with 
uniform costs that display economies of scale. This includes, e.g., the uniform buy-at-bulk [3, 10, 13, 14],
rent-or-buy [12, 9], and access network design [2, 8] problems. For all of these problems constant factor
approximations are known in the uniform costs setting for the special case where all of the commodities
share a common source. In the multi-commodity setting, i.e., with distinct sources and sinks, the rent-or-buy
network design problem admits a 2-approximation [12, 9], but the buy-at-bulk network design problem
is hard to approximate within poly-logarithmic factors [1].



In order to model information networks, Hayrapetyan et al. [11] considered a single-source network
design problem in which the cost on an edge is a monotone submodular function of the commodities that
use the edge. They obtain an $O(\log n)$ approximation via tree embeddings [5, 6], where n is the number of
vertices in the graph. Note that the cost structure in RAND is a special case of the one in [11] (coverage
functions are submodular). But, unlike [11], we consider arbitrary terminal sets and hence their setting
does not generalize ours. In addition, we obtain a stronger approximation guarantee of 2. 

1.2 Problem statement 

The Redundancy Aware Network Design problem is defined as follows. We are given a graph $G = (V, E)$
with costs $c_e$ on edges, a universe $\Pi$ of packets, and $g$ terminal sets $X_1, \ldots, X_g \subseteq V$. The demand set
of terminal set $X_j$ is denoted $D_j \subseteq \Pi$, and we denote the collection of all demand sets as $\mathcal{D}$. A solution
consists of a collection of $g$ Steiner trees $\mathcal{T} = \{T_1, \ldots, T_g\}$ where $T_j$ is a Steiner tree spanning terminal set
$X_j$. The trees specify how packets are to be routed over the edges: the packets of demand $D_j$ are routed over
edges of $T_j$. For a solution $\mathcal{T}$, the load on edge $e$ is $\ell_e(\mathcal{T}) = |\bigcup_{j : e \in T_j} D_j|$, i.e., the total number of distinct
packets being routed over edge $e$.$^1$ Our goal is to find a $\mathcal{T}$ so as to minimize the total cost $\sum_{e \in E} c_e \ell_e(\mathcal{T})$.

In the laminar demands setting, the collection of demand sets is laminar: for any $D, D' \in \mathcal{D}$, $D \cap D' \neq \emptyset$
implies that either $D \subseteq D'$ or $D' \subseteq D$.
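
The laminarity condition is easy to test directly. The following is a minimal sketch; the function name `is_laminar` and the representation of demand sets as Python sets of packet identifiers are assumptions made for illustration.

```python
from itertools import combinations


def is_laminar(demand_sets):
    """Return True if every pair of demand sets is either disjoint or nested."""
    sets = [frozenset(d) for d in demand_sets]
    return all(
        a.isdisjoint(b) or a <= b or b <= a
        for a, b in combinations(sets, 2)
    )
```

For example, the family {1}, {2}, {1, 2}, {3} is laminar, whereas {1, 2}, {2, 3} is not, since the two sets overlap without one containing the other.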

2 A primal-dual 2-approximation 

Our algorithm for the laminar demands case is an extension of the Goemans-Williamson primal-dual
algorithm for the Steiner Forest Problem [7]. We begin by defining the primal and dual linear programs.

First, we transform our objective into a simpler form where the cost of each edge is charged to a collection
of disjoint demand sets. In particular, given a solution $\mathcal{T}$, for an edge $e$ consider the demand sets
$D$ that are maximal among the collection $\{D_j : e \in T_j\}$ of demand sets that this edge carries. Because of
laminarity, these maximal demand sets are disjoint, and so the load on the edge is simply the sum of the sizes
of these demand sets. Accordingly, let us define $H_D(\mathcal{T})$ to be the set of edges $e$ such that $D$ is a maximal
set in $\{D_j : e \in T_j\}$. $D$ will contribute to the load on these edges. Then, we can write the total cost of the
solution $\mathcal{T}$ as

\[ \ell(\mathcal{T}) = \sum_e c_e \ell_e(\mathcal{T}) = \sum_e \sum_{D : H_D(\mathcal{T}) \ni e} c_e |D| = \sum_D |D| \sum_{e \in H_D(\mathcal{T})} c_e = \sum_D |D|\, c(H_D(\mathcal{T})). \]

Further note that in a feasible solution $\mathcal{T}$, for each commodity $j$, the subgraph $\bigcup_{D \supseteq D_j} H_D(\mathcal{T})$ spans the
terminal set $X_j$ because the edges of $T_j$ carry a superset of $D_j$. Therefore, instead of specifying a Steiner
tree for each terminal set, it suffices to specify a forest $H_D$ for each demand set $D$ such that each terminal
set $X_j$ is connected in $\bigcup_{D \supseteq D_j} H_D$.
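
The charging argument above can be sanity-checked on small inputs. The following is a minimal sketch, assuming that trees are given as sets of edge names and demand sets as frozensets; the helper names `maximal_sets` and `cost_two_ways` are hypothetical. It computes the total cost once from the edge loads and once by charging each edge to the maximal demand sets it carries; for laminar demands the two values coincide.

```python
def maximal_sets(sets):
    """Inclusion-maximal members of a collection of frozensets."""
    distinct = set(sets)
    return [d for d in distinct if not any(d < other for other in distinct)]


def cost_two_ways(edge_cost, trees, demands):
    """trees[j] is the edge set of T_j and demands[j] is the frozenset D_j.

    Returns (cost computed from loads, cost charged to maximal demand sets).
    """
    via_load, via_maximal = 0, 0
    for e, c in edge_cost.items():
        carried = [demands[j] for j, tree in enumerate(trees) if e in tree]
        load = len(frozenset().union(*carried))      # distinct packets on e
        via_load += c * load
        via_maximal += c * sum(len(d) for d in maximal_sets(carried))
    return via_load, via_maximal
```

With overlapping but non-nested demand sets the second value can exceed the first, which is exactly where laminarity is used.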

In the linear program below, the variable $x_{e,D}$ denotes whether $e \in H_D$. We denote by $\delta(S)$ the set
of edges crossing a cut $S \subset V$, and by $\mathcal{S}_D$ the collection of cuts $S \subset V$ that separate a terminal set $X_j$
whose demand set $D_j$ contains $D$. The cut constraints require that each terminal set $X_j$ is connected by
the edges chosen for demand sets that contain $D_j$.



$^1$More generally, we can define the load on an edge to be the total size of all of the distinct packets that an edge carries. Since
our algorithm's performance or running time does not depend on the number of distinct packets, we may assume without loss of
generality that all packets have equal size.



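\[
\begin{array}{lll}
\text{minimize} & \displaystyle\sum_{e,\, D \in \mathcal{D}} x_{e,D} \cdot |D|\, c_e & \\[2ex]
\text{subject to} & \displaystyle\sum_{D' \supseteq D} \sum_{e \in \delta(S)} x_{e,D'} \geq 1 & \forall D \in \mathcal{D},\ S \in \mathcal{S}_D
\end{array}
\]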



The corresponding dual linear program is as follows. 



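\[
\begin{array}{lll}
\text{maximize} & \displaystyle\sum_{D \in \mathcal{D},\, S \in \mathcal{S}_D} y_{D,S} & \\[2ex]
\text{subject to} & \displaystyle\sum_{D' \subseteq D} \sum_{S \in \mathcal{S}_{D'} : e \in \delta(S)} y_{D',S} \leq |D|\, c_e & \forall e,\ D \in \mathcal{D}
\end{array}
\]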



2.1 Algorithm 

The algorithm starts with a dual ascent stage in which it adds edges to forests $\{F_D\}_{D \in \mathcal{D}}$, and ends with a
pruning stage. In the following discussion, for a demand set $D \in \mathcal{D}$ we say that $S \in \mathcal{S}_D$ is a D-unsatisfied
cut if $\bigcup_{D' \supseteq D} F_{D'} \cap \delta(S) = \emptyset$. We also say that an edge $e$ is D-tight if

\[ \sum_{D' \subseteq D} \sum_{S \in \mathcal{S}_{D'} : e \in \delta(S)} y_{D',S} = |D|\, c_e. \]



In the dual ascent stage, the algorithm raises duals in phases, one per demand set $D \in \mathcal{D}$ in order of
increasing size. In phase D, while there exists a D-unsatisfied cut it alternates between raising duals of the
minimal D-unsatisfied cuts and adding D-tight edges to $F_D$. We say that S is an active set in the current
iteration of the inner while loop if it is a minimal D-unsatisfied cut. The algorithm ensures that at the end of
phase D, the edges $F_D$ are paid for by the dual and $F_D$ is a Steiner forest for terminal sets whose demand
set contains D. In the pruning stage, the algorithm processes the demand sets in order of decreasing size
and removes unnecessary edges from $\{F_D\}_{D \in \mathcal{D}}$ and returns $\{H_D\}_{D \in \mathcal{D}}$.

We need the following lemma to prove that we can efficiently find active sets. 

Lemma 1. In any iteration in phase D, a set S is active if and only if it is a component of $F_D$ and it
separates a terminal set whose demand set contains D.

Proof. Let S be an active set. By definition, S is a minimal cut in $\mathcal{S}_D$ such that $\bigcup_{D' \supseteq D} F_{D'} \cap \delta(S) = \emptyset$.
Since $S \in \mathcal{S}_D$, it separates a terminal set whose demand set contains D. The algorithm processes the
demand sets in increasing order of size, so we have $F_{D'} = \emptyset$ for $D' \supsetneq D$ and thus $F_D \cap \delta(S) = \emptyset$. This
implies that $S \cap C = \emptyset$ or $S \cap C = C$ for every connected component C of $F_D$ and so S is a
union of connected components of $F_D$. By minimality, we have that S is a connected component of $F_D$.

For the converse, consider a connected component S of $F_D$ that separates a terminal set whose demand
set contains D. By definition, we have $S \in \mathcal{S}_D$. Since S is a connected component of $F_D$ and $F_{D'} = \emptyset$ for
$D' \supsetneq D$, it is a minimal set in $\mathcal{S}_D$ such that $\bigcup_{D' \supseteq D} F_{D'} \cap \delta(S) = \emptyset$. Therefore S is an active set. $\square$
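
Lemma 1 reduces the search for active sets to a scan over connected components. The following is a minimal sketch of that scan, assuming a union-find over the current forest; the function name `active_sets` and the input representation (nodes as hashable labels, edges as pairs, terminal sets as Python sets) are assumptions made for illustration. The caller is expected to pass only the terminal sets whose demand set contains D.

```python
def active_sets(nodes, forest_edges, terminal_sets):
    """Components of the forest that contain some, but not all, nodes of
    at least one of the given terminal sets."""
    parent = {v: v for v in nodes}

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v in forest_edges:
        parent[find(u)] = find(v)

    components = {}
    for v in nodes:
        components.setdefault(find(v), set()).add(v)

    # S separates X exactly when S meets X and X is not contained in S.
    return [S for S in components.values()
            if any(S & X and X - S for X in terminal_sets)]
```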



2.2 Analysis 

Our analysis follows along the lines of the analysis for the Goemans-Williamson algorithm. We first prove 
that the primal and dual solutions generated by the algorithm are feasible. 



Algorithm 1 Primal-Dual Algorithm for Laminar Buy-at-Bulk

Initialize $F_D \leftarrow \emptyset$ for all $D \in \mathcal{D}$ and $y_{D,S} \leftarrow 0$ for all $D \in \mathcal{D}$, $S \subset V$.
(Dual ascent stage)
for $D \in \mathcal{D}$ in increasing order of size do
    (Start of phase D)
    while there exists a D-unsatisfied cut do
        Simultaneously raise $y_{D,S}$ for active sets S until some edge e goes D-tight.
        $F_D \leftarrow F_D + e$.
    end while
    (End of phase D)
end for
(End of dual ascent stage)
(Pruning stage)
$H_D \leftarrow F_D$ for all $D \in \mathcal{D}$.
for $D \in \mathcal{D}$ in decreasing order of size do
    for $e \in H_D$ do
        if $(H_D - e) \cup \bigcup_{D' \supsetneq D} H_{D'}$ is a Steiner forest for terminal sets with demand set D then
            $H_D \leftarrow H_D - e$.
        end if
    end for
end for
(End of pruning stage)
return $\{H_D\}_{D \in \mathcal{D}}$
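
The two stages of Algorithm 1 can be prototyped directly. The following is a minimal, unoptimized sketch rather than a reference implementation. It assumes a connected graph with integer (or `Fraction`) edge costs, tracks only the total dual paid per edge and phase instead of individual $y_{D,S}$ values, and breaks ties by edge order; the function names and the input format are hypothetical.

```python
from fractions import Fraction


def components(nodes, edges):
    """Map every node to a representative of its connected component."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v in edges:
        parent[find(u)] = find(v)
    return {v: find(v) for v in nodes}


def connects(nodes, edges, terminal_sets):
    """True if every terminal set lies inside a single component."""
    comp = components(nodes, edges)
    return all(len({comp[v] for v in X}) == 1 for X in terminal_sets)


def laminar_primal_dual(nodes, cost, commodities):
    """cost: {(u, v): c_e}; commodities: list of (terminal_set, demand_set).

    Returns the pruned forests {D: H_D} and the total dual value.
    """
    demands = sorted({frozenset(D) for _, D in commodities}, key=len)
    paid = {(e, D): Fraction(0) for e in cost for D in demands}
    F, dual = {D: [] for D in demands}, Fraction(0)

    for D in demands:                        # dual ascent, smallest sets first
        targets = [X for X, Dj in commodities if D <= frozenset(Dj)]
        while not connects(nodes, F[D], targets):
            comp = components(nodes, F[D])
            # By Lemma 1, active sets are components that split a target set.
            active = {comp[v] for X in targets for v in X
                      if len({comp[w] for w in X}) > 1}
            best, rates = None, {}
            for e, c in cost.items():
                u, v = e
                rate = (comp[u] in active) + (comp[v] in active)
                if comp[u] == comp[v] or rate == 0:
                    continue
                slack = len(D) * c - sum(paid[e, Dp] for Dp in demands if Dp <= D)
                rates[e] = rate
                if best is None or slack / rate < best[0]:
                    best = (slack / rate, e)
            delta, tight = best
            for e, rate in rates.items():    # every crossing edge absorbs dual
                paid[e, D] += rate * delta
            dual += delta * len(active)
            F[D].append(tight)

    H = {D: list(F[D]) for D in demands}
    for D in reversed(demands):              # pruning, largest sets first
        own = [X for X, Dj in commodities if frozenset(Dj) == D]
        bigger = [e for Dp in demands if D < Dp for e in H[Dp]]
        for e in list(H[D]):
            rest = [f for f in H[D] if f != e]
            if connects(nodes, rest + bigger, own):
                H[D] = rest
    return H, dual
```

The cost of the returned solution is the sum of $|D| \cdot c_e$ over all demand sets D and edges e in `H[D]`, which can be compared against twice the returned dual value.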



Lemma 2. The primal solution $\{H_D\}_{D \in \mathcal{D}}$ and the dual solution $\{y_{D,S}\}_{D \in \mathcal{D}, S \subset V}$ are feasible.

Proof. We first prove that the primal solution is feasible. Consider an iteration during the pruning stage.
We say that terminal set $X_j$ is H-disconnected if it is disconnected with respect to edge set $\bigcup_{D \supseteq D_j} H_D$ and
H-connected otherwise. We will show that all terminal sets are H-connected in all iterations of the pruning
stage.

Observe that at the end of phase D, there are no D-unsatisfied cuts and $F_{D'} = \emptyset$ for $D' \supsetneq D$. Thus, all
terminal sets with demand set D are connected with respect to edge set $F_D$. At the beginning of the pruning
stage, we have $H_D = F_D$ for all $D \in \mathcal{D}$, and so all terminal sets are H-connected. Consider an iteration
in which the algorithm deletes an edge e from $H_D$. By definition of H-disconnected, this can only cause a
terminal set with demand set $D' \subseteq D$ to be H-disconnected. However, the algorithm will not delete e if it
causes a terminal set with demand set D to be H-disconnected. Now consider a demand set $D' \subsetneq D$. Since
$|D'| < |D|$, we still have $H_{D'} = F_{D'}$ so all terminal sets with demand set D' are H-connected. Thus, all
terminal sets are H-connected throughout the pruning stage and so $\{H_D\}_{D \in \mathcal{D}}$ is a feasible primal solution.

The dual solution is feasible since the algorithm explicitly ensures that the dual variables in a tight
constraint are not raised. $\square$

Next, we prove that in each phase D of the dual raising stage, the current active sets have average degree
with respect to edges $\bigcup_{D' \supseteq D} H_{D'}$ (formally defined below) at most 2 in every iteration. This in turn implies
that the primal solution has cost at most twice the total dual value. Since the dual is feasible, we have that the
algorithm gives a 2-approximation. We bound the average degree of active sets by showing that $\bigcup_{D' \supseteq D} H_{D'}$
is a forest and that no inactive set has degree 1.
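
The way the degree bound yields the approximation factor can be written as a single chain of inequalities. In the sketch below, $\mathrm{OPT}_{\mathrm{LP}}$ and $\mathrm{OPT}$ are symbols introduced here for the optimal values of the linear programming relaxation and of the original problem; they are assumptions of this illustration and not notation used elsewhere in the note. The first inequality is the degree bound, the second is weak duality applied to a feasible dual solution, and the third holds because the linear program is a relaxation.

```latex
\[
\sum_{D \in \mathcal{D}} |D|\, c(H_D)
\;\le\; 2 \sum_{D \in \mathcal{D}} \sum_{S \in \mathcal{S}_D} y_{D,S}
\;\le\; 2\,\mathrm{OPT}_{\mathrm{LP}}
\;\le\; 2\,\mathrm{OPT}.
\]
```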

Lemma 3. For all $D \in \mathcal{D}$, we have that $\bigcup_{D' \supseteq D} H_{D'}$ is a forest.

Proof. Suppose, towards a contradiction, that the statement is false. Let D be a maximal demand set such
that $\bigcup_{D' \supseteq D} H_{D'}$ contains a cycle C. By maximality, there exists $e \in C \cap H_D$. Since e is in a cycle in
$\bigcup_{D' \supseteq D} H_{D'}$, we have that $(H_D - e) \cup \bigcup_{D' \supsetneq D} H_{D'}$ is still a Steiner forest for terminal sets with demand set
D. Thus, the algorithm would have removed e from $H_D$ and so we have a contradiction. $\square$

For a subset of edges $E' \subseteq E$, let $\deg_{E'}(S) = |\delta(S) \cap E'|$ denote the number of edges in E' that leave S.

Lemma 4. Consider an iteration in phase D of the dual raising stage. Let S be a connected component of
$F_D$ in this iteration. If $S \notin \mathcal{S}_D$, then $\sum_{D' \supseteq D} \deg_{H_{D'}}(S) \neq 1$.

Proof. We prove the contrapositive. Suppose $\sum_{D' \supseteq D} \deg_{H_{D'}}(S) = 1$. Let e and $A \supseteq D$ be the unique
edge and demand set, respectively, such that $e \in H_A \cap \delta(S)$. Since the algorithm did not delete e from $H_A$
and $\bigcup_{D' \supseteq A} H_{D'}$ is acyclic by Lemma 3, there exists $X_j$ with $D_j = A$ and $u, v \in X_j$ such that e is on the
unique u-v path in $\bigcup_{D' \supseteq A} H_{D'}$. Since $\sum_{D' \supseteq D} \deg_{H_{D'}}(S) = 1$, the path crosses S exactly once. Thus,
we have that S separates u, v and so $S \in \mathcal{S}_A$. By definition of $\mathcal{S}_D$, we have $\mathcal{S}_A \subseteq \mathcal{S}_D$ and this completes
the proof of the lemma. $\square$

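We are now ready to prove that the primal solution has cost at most twice the dual value.

Lemma 5. $\sum_D \sum_{e \in H_D} |D|\, c_e \leq 2 \sum_{D,S} y_{D,S}$.

Proof. Using the fact that we only add tight edges, we have

\[
\begin{aligned}
\sum_D \sum_{e \in H_D} |D|\, c_e
&= \sum_D \sum_{e \in H_D} \Bigl( \sum_{D' \subseteq D} \sum_{S \in \mathcal{S}_{D'} : e \in \delta(S)} y_{D',S} \Bigr) \\
&= \sum_{D'} \sum_{S \in \mathcal{S}_{D'}} y_{D',S} \Bigl( \sum_{D \supseteq D'} \deg_{H_D}(S) \Bigr) \\
&= \sum_{D'} \sum_{S \in \mathcal{S}_{D'}} y_{D',S} \deg_{\bigcup_{D \supseteq D'} H_D}(S).
\end{aligned}
\]

The second equality is obtained by rearranging, and the last follows from the fact that each edge is in $H_D$
for at most one $D \supseteq D'$.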

Suppose that in an iteration in phase D', the dual for each active set is raised by $\Delta$. Then we have that
$\sum_{S \in \mathcal{S}_{D'}} y_{D',S} \deg_{\bigcup_{D \supseteq D'} H_D}(S)$ increases by $\Delta \cdot \sum_{S \text{ active}} \deg_{\bigcup_{D \supseteq D'} H_D}(S)$, and $\sum_{D,S} y_{D,S}$ increases by
$\Delta \cdot$ # of active sets. So it suffices to prove that in each phase D' and in each iteration within the phase, the
average degree of active sets is at most 2:

\[ \sum_{S \text{ active}} \deg_{\bigcup_{D \supseteq D'} H_D}(S) \leq 2 \cdot \#\text{ of active sets}. \]

Fix an iteration in phase D'. Note that each active set corresponds to some connected component of
$F_{D'}$ by Lemma 1. Let G' be a graph whose nodes are connected components of $F_{D'}$ and whose edge set is
$\bigcup_{D \supseteq D'} H_D$. The degree of a node in G' is equal to the degree of the corresponding set with respect to edge
set $\bigcup_{D \supseteq D'} H_D$. Let us say that a node of G' corresponding to an active set is an active node, and that any
other node is inactive. We want to show that the average degree of active nodes in G' is at most 2. Suppose
we remove all isolated nodes from G'. In the resulting graph, the degree of each inactive node is at least 2
by Lemma 4 and the average degree is at most 2 by Lemma 3. Thus, the average degree of active nodes is
at most 2. $\square$
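
The final counting step can be spelled out as follows. Here $N_a$ and $N_i$ are symbols introduced for this sketch: they denote the numbers of active and inactive nodes that remain in G' after isolated nodes are removed, and $\deg(v)$ denotes the degree of node $v$ in G'. The sketch assumes, as in the argument above, that G' is a forest, so it has at most as many edges as nodes.

```latex
\[
\sum_{v \text{ active}} \deg(v)
\;=\; \sum_{v} \deg(v) \;-\; \sum_{v \text{ inactive}} \deg(v)
\;\le\; 2\,(N_a + N_i) \;-\; 2\,N_i
\;=\; 2\,N_a .
\]
```

Isolated active nodes that were removed have degree 0, so adding them back only increases the number of active sets and the bound continues to hold.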

Lemmas 2 and 5 give us the following theorem.

Theorem 6. Algorithm 1 is a 2-approximation for redundancy aware network design with laminar demands.
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